Abstract: In the process of electromagnetic modeling and simulation, one encounters various
limits imposed by the hardware capabilities of modern computers. As the complexity or the
electrical size of the problem grows, so does the need for faster processors and more RAM in
order to make the analysis of such projects feasible. With the era of 64-bit computing on our
doorstep, 4 GB is no longer the theoretical maximum addressable memory space on PCs,
which makes the analysis of demanding electromagnetic problems possible on every desktop. In this
paper, several tests are presented regarding the analysis of a cube of dimensions up to 30λ x
30λ x 30λ. Significant advancements in the modeling and analysis of electrically large structures in
the WIPL-D Pro code are the main focus. The tests include running the WIPL-D Pro code in 64-bit
Windows and Linux environments, employing two processors in parallel, and speed comparisons
between the latest and previous versions of the code.
1.  Introduction

With the advent of 64-bit computing, the painful limit on the addressable memory space of
personal computers (4 GB theoretically, but 2-3 GB in practice) was eliminated. Hence, our
perception of electrically large problems is slowly shifting upwards, and so are the demands that
engineers place on EM simulation tools running on their PCs. The advancements recently made in the
WIPL-D Pro 3D electromagnetic solver are illustrated in this paper using the example of a metallic
cube scatterer with a surface area of up to 5400 λ².

In  Section  2,  the  simulation  setup  is  explained  in  detail. 

In  Section  3,  a  speed  comparison  between  versions  4.1  and  5.1  of  WIPL-D  Pro  is  given  in  order  to 
illustrate  the  performance  of  the  new,  redesigned  LU  decomposition  algorithm.  The  simulated 
project is a cube scatterer with a side varying from 3λ to 18λ.

In Section 4, the cube scatterer is analyzed on a SuSE Linux system running on a 64-bit AMD Opteron
processor with 8 GB of RAM. The analysis is performed for up to 30,000 unknowns, which is
enough to represent a 5400 λ² cube using 2 symmetry planes. This is followed in Section 5 by
a speed comparison between simulations performed on Windows XP and Linux running on the
same hardware.

In Section 6, the parallel execution of multiple WIPL-D projects is investigated on Windows, running on a
dual-processor Opteron system.

Conclusions based on the acquired results are given in Section 7.

2.  Simulation  Setup 

The examined metallic cube scatterer is shown in Figure 1. The model was created in WIPL-D Pro
5.1, using 2 symmetry planes. The electrical length of each patch was kept constant at 1.5λ
throughout the simulations, while the electrical size of the cube as a whole was increased by
increasing the frequency and the number of metallic patches per cube side.

The  cube  scatterer  was  excited  by  a  linearly  polarized  plane  wave  in  the  cross-section  of  the  two 
symmetry  planes,  orthogonal  to  one  of  the  sides  of  the  cube,  as  shown  in  Figure  1. 

The RCS in all projects was calculated in the incident plane, in 1801 directions, from Theta=0° to
Theta=180°. One quarter of the RCS diagram for the largest cube scatterer, with a side of
30λ, is shown in Figure 2.

3.  Faster  Linear  System  Solution  in  WIPL-D  Pro  5.1 

When the method of moments is applied in electromagnetic simulation, the linear operator equation is
reduced to a system of linear equations. Thus, the analysis duration consists of the system matrix
fill-in time, the time needed to solve the linear system, and the data post-processing time.

Disregarding  the  data  post-processing  time,  the  analysis  duration  can  be  described  as: 


$$T = A N^2 + \frac{1}{3} B N^3 \qquad (1)$$


where  A  is  the  fill-in  time  of  a  single  matrix  element,  B  is  the  time  needed  to  perform  one  basic 
operation  (one  addition  and  one  multiplication)  while  solving  the  linear  system,  and  N  is  the 
number  of  unknowns. 
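
As a rough illustration of equation (1), the following Python sketch estimates the fill-in and solution times for a given number of unknowns. The function name and the default values of A and B (rounded figures of the order reported in Table 1) are assumptions made for this example and are not part of WIPL-D Pro.

```python
def estimated_duration(n_unknowns, a_sec=5.2e-6, b_sec=5.7e-9):
    """Return (fill_time, solve_time, total) in seconds from equation (1).

    a_sec: fill-in time per matrix element (assumed, about 5.2 microseconds).
    b_sec: time per basic operation in the solver (assumed, about 5.7 nanoseconds).
    """
    fill_time = a_sec * n_unknowns ** 2          # A * N^2
    solve_time = b_sec * n_unknowns ** 3 / 3.0   # (1/3) * B * N^3
    return fill_time, solve_time, fill_time + solve_time


for n in (300, 3000, 30000):
    fill, solve, total = estimated_duration(n)
    print(f"N = {n:6d}: fill-in {fill:9.1f} s, solution {solve:9.1f} s, total {total:9.1f} s")
```

With these assumed constants the sketch gives roughly 4,700 s for fill-in and 51,000 s for the solution at N = 30,000, so the cubic term clearly dominates for large problems.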

With  electrically  large  structures,  the  largest  amount  of  time  is  spent  on  LU  decomposition,  which 
is  most  often  applied  in  order  to  solve  the  system  of  linear  equations.  Thus,  this  part  of  the 
analysis  is  the  most  interesting  one  to  be  optimized  in  order  to  reduce  total  analysis  duration. 

From version 5.1 onwards, the LU decomposition performed in WIPL-D Pro has been redesigned.
The algorithm has been optimized for more efficient memory usage and faster computation. This resulted in
a significant speed increase for projects with a medium or large number of unknowns. A diagram
illustrating the acceleration of version 5.1 compared to version 4.1 is presented in Figure 3.
The acceleration is obtained by simply dividing the duration of the LU decomposition in v4.1 by the
duration of the LU decomposition in v5.1 for the cube scatterer project described in the previous
section. The acceleration is about 50% lower if it is calculated from the total analysis duration,
including matrix fill-in time and data post-processing.
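
To see why the acceleration of the total analysis is lower than that of the LU decomposition alone, the sketch below combines an LU speed-up factor with fill-in and post-processing times that are assumed to be identical in both versions. The function name, that assumption, and the sample timings are purely illustrative; they are not measurements of v4.1 or v5.1.

```python
def total_acceleration(lu_new, fill, post, lu_speedup):
    """Speed-up of the whole analysis when only the LU step gets faster.

    lu_new:     LU decomposition time in the newer version [s]
    fill, post: fill-in and post-processing times, assumed equal in both versions [s]
    lu_speedup: older LU time divided by newer LU time
    """
    old_total = fill + post + lu_new * lu_speedup
    new_total = fill + post + lu_new
    return old_total / new_total


# Hypothetical timings: LU takes 200 s, fill-in 120 s, post-processing 30 s,
# and the LU step alone is 30 times faster than before.
print(round(total_acceleration(lu_new=200.0, fill=120.0, post=30.0, lu_speedup=30.0), 1))
```

The parts of the analysis that are not accelerated set a ceiling on the overall gain, which is why the LU-only figure is the larger of the two.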

4.  Cube  Scatterer  Benchmark  -  up  to  5400  λ²

Let us now observe a practical example and calculate the values of A and B from (1) based on the
experimental results. Ten cube scatterers were analyzed on the SuSE Linux operating system running
on a 1.4 GHz AMD Opteron processor with 8 GB of RAM and a bus clock rate of 133 MHz. The
diagram in Figure 4 was obtained by increasing the electrical size of the cube, and thus
the number of unknowns, and measuring the duration of the analysis. It displays the total analysis
duration of a cube scatterer versus the number of unknowns. The sides of the corresponding cubes range
from 3λ at 300 unknowns up to 30λ at 30,000 unknowns.
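
The memory demand behind these problem sizes can be estimated with a short sketch. It assumes that the full N x N system matrix of complex entries is held in RAM and that each entry takes 8 bytes (single-precision complex). Neither the storage layout nor the precision is stated here, so both are assumptions, as is the function name.

```python
def dense_matrix_gib(n_unknowns, bytes_per_entry=8):
    """Memory needed for a full n x n matrix, in GiB.

    bytes_per_entry=8 assumes single-precision complex entries (an assumption;
    16 would correspond to double-precision complex).
    """
    return n_unknowns ** 2 * bytes_per_entry / 2 ** 30


for n in (15000, 30000):
    print(f"N = {n}: {dense_matrix_gib(n):.2f} GiB")
```

Under these assumptions, 30,000 unknowns need about 6.7 GiB for the matrix alone, which exceeds the 2-3 GB usable on a 32-bit PC but fits in 8 GB of RAM.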

The values of A and B calculated from the cube scatterer benchmark are given in Table 1. If
we disregard the values corresponding to the first project, with 300 unknowns, the estimates of
A and B are stable: A stays between 5.19 and 5.38 microseconds, and B between 5.49 and 6.07 nanoseconds.
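
The calculation of A and B can be sketched in a few lines of Python. The sketch assumes that the matrix inversion time in Table 1 corresponds to the solution term of equation (1), taken as B N^3 / 3, which reproduces the tabulated values. Only the rows whose number of unknowns is listed explicitly are used, and the function and variable names are hypothetical.

```python
# (N, matrix fill-in time [s], matrix inversion time [s]) for the Table 1 rows
# whose number of unknowns is given explicitly.
ROWS = [
    (300, 0.5, 0.1),
    (1200, 7.6, 3.3),
    (2700, 37.8, 36.0),
    (4800, 124.0, 210.8),
    (24300, 3080.3, 29031.9),
    (30000, 4666.9, 52640.2),
]


def fit_a_b(n, fill_time, solve_time):
    """Invert equation (1) term by term: A = T_fill / N^2, B = 3 * T_solve / N^3."""
    a = fill_time / n ** 2
    b = 3.0 * solve_time / n ** 3
    return a, b


for n, fill_time, solve_time in ROWS:
    a, b = fit_a_b(n, fill_time, solve_time)
    print(f"N = {n:6d}: A = {a * 1e6:5.2f} microsec, B = {b * 1e9:5.2f} nanosec")
```

The outlying B value for the 300-unknown project comes from a solution time of only 0.1 s, which is too short to be measured reliably.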

5.  Speed  Comparison  Between  Linux  and  Windows 

Cube scatterer projects of up to 15,000 unknowns were run with WIPL-D Pro v5.1 on SuSE Linux
9.0 and Windows XP 64 on the same hardware platform: a 1.4 GHz AMD Opteron with 8 GB of
RAM and a bus clock rate of 133 MHz. The results are given in Figure 5.

The Windows version of WIPL-D Pro 5.1 outperforms the Linux version by about 25% at
2700 unknowns and by about 14% at 14,700 unknowns. There are numerous possible reasons for the
differences in speed, one of them surely being the different compilers used to build the application on the two
operating systems. Another possible reason is the different memory management on the two
systems.


6.  Running  Multiple  WIPL-D  Projects  in  Parallel 

Employing more than one processor at a time allows a significant speed-up of the electromagnetic
modeling and simulation process. On dual-processor systems, the time needed for the analysis is
almost halved.

Running multiple WIPL-D projects on more than one processor was tested on a system with two
1.4 GHz AMD Opteron processors and 8 GB of RAM. Since the WIPL-D Pro code is not
parallelized, multiple projects were run at the same time as part of a batch procedure, thus letting
the operating system take on the task of employing both processors. Each batch procedure
consisted of 30 identical metallic cube scatterers in order to establish the exact speed increase
compared to the execution of a single project.

The results are displayed in Figure 6. The acceleration is calculated by dividing the time needed to
execute 30 projects employing only one processor by the time needed to execute those 30
projects employing both processors. The number of tasks simultaneously assigned to the
dual-processor system is denoted by x. From the results we can observe that the acceleration is less
than 2, which is expected since both processors share the same memory. It is also apparent that if we
overload the two processors by simultaneously assigning 4 tasks with a relatively small number of
unknowns to them, we achieve a small performance increase compared to the case when only two
simultaneous tasks are assigned. As the size of the tasks becomes larger, this advantage
deteriorates and the performance converges to that of the non-overloaded processors.
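
The batch procedure and the definition of acceleration used above can be sketched as follows. This is a minimal illustration, not the procedure actually used for the measurements: the function names are hypothetical, and a do-nothing Python process stands in for a solver run, since the command that launches a WIPL-D project is not given here.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def run_batch(commands, simultaneous_tasks):
    """Run each command as a separate process, at most `simultaneous_tasks` at a time.

    Returns (return_codes, elapsed_seconds). The operating system decides
    which processor each process runs on.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=simultaneous_tasks) as pool:
        return_codes = list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))
    return return_codes, time.perf_counter() - start


def acceleration(time_one_processor, time_two_processors):
    """Acceleration as defined in the text: single-processor time over dual-processor time."""
    return time_one_processor / time_two_processors


if __name__ == "__main__":
    # Placeholder job (assumption): an empty Python process instead of a solver run.
    job = [sys.executable, "-c", "pass"]
    codes, elapsed = run_batch([job] * 30, simultaneous_tasks=2)
    print(f"{codes.count(0)} of {len(codes)} jobs finished in {elapsed:.2f} s")
```

Setting `simultaneous_tasks` to 2 or 4 corresponds to the x = 2 and x = 4 cases, while a value of 1 runs the projects one after another as a single-processor reference.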

7.  Conclusions 

The  performance  of  WIPL-D  Pro  code  in  analysis  of  electrically  large  structures  was  investigated 
in  this  paper.  A  metallic  cube  scatterer  was  taken  as  a  benchmark  project. 

The calculations were performed with WIPL-D Pro 5.1, featuring the redesigned LU
decomposition routine, which is significantly faster than in previous versions.
For the metallic cube scatterer, the acceleration was about 18 times at 15,000 unknowns (a
cube with a 12λ side), taking into account all the segments of the analysis. The increase was even
greater, up to 30 times, when only the LU decomposition part of the analysis was observed.

The metallic cube scatterer with a side from 3λ to 30λ was analyzed on a 64-bit system,
demonstrating the breaking of the 2 GB memory limit. The analysis time for the cube of
dimensions 30λ x 30λ x 30λ on a 1.4 GHz Opteron processor was about 14 hours.

Multiple projects were run in parallel on a two-processor system. A speed increase of close to 2 times
was demonstrated.

Number of   Total analysis   Matrix fill-in   Data post-processing   Matrix inversion   A            B
unknowns    duration [sec]   time [sec]       time [sec]             time [sec]         [microsec]   [nanosec]

300         2.6              0.5              2.0                    0.1                5.56         11.11
1200        19.6             7.6              8.7                    3.3                5.28         5.73
2700        94.3             37.8             20.5                   36.0               5.19         5.49
4800        371.3            124.0            36.5                   210.8              5.38         5.72
-           1143.9           298.1            57.2                   788.6              5.30         5.61
-           3099.5           612.2            83.7                   2403.6             5.25         5.72
-           7398.4           1147.4           113.9                  6137.1             5.31         5.80
-           15558.6          1932.0           150.9                  13475.7            5.24         5.71
24300       32304.5          3080.3           192.3                  29031.9            5.22         6.07
30000       57546.8          4666.9           239.7                  52640.2            5.19         5.85

Table 1: Values A and B calculated based on cube scatterer benchmark


Figure  1:  Cube  scatterer  modeled  in  the 
program  WIPL-D  Pro  5.1 


Figure 2: RCS of the cube scatterer with the
side of 30λ


Figure 3: Acceleration of WIPL-D Pro v5.1
compared to v4.1 - LU decomposition


Figure  4:  Cube  scatterer  benchmark  up  to 
30,000  unknowns 


Figure 5: Speed comparison between Linux
and Windows versions of WIPL-D Pro 5.1


Figure 6: Acceleration in the case of dual
processor run; x - number of tasks
simultaneously assigned